|
Parallel Thread Execution (PTX) is a pseudo-assembly language used in Nvidia's CUDA programming environment. The nvcc compiler translates code written in CUDA, a C-like language, into PTX, and the graphics driver contains a compiler which translates the PTX into a binary code which can be run on the processing cores. == Registers == PTX uses an arbitrarily large register set; the output from the compiler is almost pure single-assignment form, with consecutive lines generally referring to consecutive registers. Programs start with declarations of the form .reg .u32 %r<335>; // declare 335 registers %r0, %r1, ..., %r334 of type unsigned 32-bit integer It is a three-argument assembly language, and almost all instructions explicitly list the data type (in terms of sign and width) on which they operate. Register names are preceded with a % character and constants are literal, e.g.: shr.u64 %rd14, %rd12, 32; // shift right an unsigned 64-bit integer from %rd12 by 32 positions, result in %rd14 cvt.u64.u32 %rd142, %r112; // convert an unsigned 32-bit integer to 64-bit There are predicate registers, but compiled code in shader model 1.0 uses these only in conjunction with branch commands; the conditional branch is @%p14 bra $label; // branch to $label The setp.cc.type instruction sets a predicate register to the result of comparing two registers of appropriate type, there is also a set instruction, where set.le.u32.u64 %r101, %rd12, %rd28 sets the 32-bit register %r101 to 0xffffffff if the 64-bit register %rd12 is less than or equal to the 64-bit register %rd28. Otherwise %r101 is set to 0x00000000. There are a few predefined identifiers that denote pseudoregisters. Among others, %tid, %ntid, %ctaid, and %nctaid contain, respectively, thread indices, block dimensions, block indices, and grid dimensions.〔(PTX ISA Version 2.3 )〕 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Parallel Thread Execution」の詳細全文を読む スポンサード リンク
|